AITopics | successful experience

successful experience

AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning

Yan, Renye, Gan, Yaozhong, Wu, You, Xing, Junliang, Liangn, Ling, Zhu, Yeshang, Cai, Yimao

arXiv.org Artificial Intelligence, Oct-6-2024

Abstract: In sparse reward scenarios of reinforcement learning (RL), the memory mechanism provides promising shortcuts to policy optimization by reflecting on past experiences, as humans do. However, current memory-based RL methods simply store and reuse high-value policies; they lack deeper refinement and filtering of diverse past experiences, which limits the capability of memory. In this paper, we propose AdaMemento, an adaptive memory-enhanced RL framework. Instead of just memorizing positive past experiences, we design a memory-reflection module that exploits both positive and negative experiences by learning to predict known local optimal policies based on real-time states. To effectively gather informative trajectories for the memory, we further introduce a fine-grained intrinsic motivation paradigm, where nuances in similar states can be precisely distinguished to guide exploration. The exploitation of past experiences and exploration of new policies are then adaptively coordinated by ensemble learning to approach the global optimum. Furthermore, we theoretically prove the superiority of our new intrinsic motivation and ensemble mechanism. From 59 quantitative and visualization experiments, we confirm that AdaMemento can distinguish subtle states for better exploration and effectively exploit past experiences in memory, achieving significant improvement over previous methods.

However, in sparse reward environments, policy updates become unstable and ineffective due to insufficient feedback (Bellemare et al., 2016; Liang et al., 2018). This significantly increases the difficulty of learning effective long-horizon policies. Memory offers a promising solution to the sparse reward problem, as humans can effectively learn from past experiences to avoid repeating mistakes in similar scenarios (Liu et al., 2021; Bransford & Johnson, 1972; Andrychowicz et al., 2017). Through memory, agents can utilize prior successful experiences to refine their policies in complex environments, hence reducing the reliance on dense reward feedback and improving both learning efficiency and policy stability (Pathak et al., 2017). Existing memory-based RL methods can be roughly categorized into two classes.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv:2410.04498

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Cluster-based Sampling in Hindsight Experience Replay for Robotic Tasks (Student Abstract)

Kim, Taeyoung, Har, Dongsoo

arXiv.org Artificial Intelligence, Jan-10-2024

In multi-goal reinforcement learning with a sparse binary reward, training agents is particularly challenging, due to a lack of successful experiences. To solve this problem, hindsight experience replay (HER) generates successful experiences even from unsuccessful ones. However, generating successful experiences from uniformly sampled ones is not an efficient process. In this paper, the impact of exploiting the property of achieved goals in generating successful experiences is investigated and a novel cluster-based sampling strategy is proposed. The proposed sampling strategy groups episodes with different achieved goals by using a cluster model and samples experiences in the manner of HER to create the training batch. The proposed method is validated by experiments with three robotic control tasks of the OpenAI Gym. The results of experiments demonstrate that the proposed method is substantially sample efficient and achieves better performance than baseline approaches.
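The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the episode layout (`obs`, `actions`, `achieved`), the tiny k-means helper, clustering on the final achieved goal, the "future" relabeling strategy, and the 0/-1 reward with tolerance `tol` are all assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, iters=20, rng=random):
    """Tiny k-means; returns a cluster index for every point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def sparse_reward(achieved, goal, tol=0.05):
    """Binary reward: 0 on success, -1 otherwise (assumed convention)."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def sample_batch(episodes, k, batch_size, rng=random):
    """Sample HER-relabeled transitions, drawing episodes evenly across clusters."""
    # Group episodes by where they ended up (their final achieved goal).
    finals = [ep["achieved"][-1] for ep in episodes]
    labels = kmeans(finals, k, rng=rng)
    clusters = [[ep for ep, l in zip(episodes, labels) if l == c]
                for c in range(k)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    batch = []
    for _ in range(batch_size):
        ep = rng.choice(rng.choice(clusters))   # pick a cluster, then an episode
        t = rng.randrange(len(ep["actions"]))   # transition index
        future = rng.randrange(t + 1, len(ep["achieved"]))
        goal = ep["achieved"][future]           # hindsight goal from a later step
        batch.append({
            "obs": ep["obs"][t],
            "action": ep["actions"][t],
            "goal": goal,
            "reward": sparse_reward(ep["achieved"][t + 1], goal),
        })
    return batch
```

Here each episode stores one more achieved goal than it has actions, so a later achieved goal always exists for relabeling. Choosing the cluster before the episode keeps rarely reached goal regions from being drowned out by common ones, which is one plausible reading of why clustering would help over uniform sampling.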

cluster model, replay buffer, successful experience, (10 more...)

arXiv:2208.14741

Country: Asia > South Korea > Daejeon > Daejeon (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Li, Chao

arXiv.org Artificial Intelligence, Dec-7-2022

Deep reinforcement learning (DRL) provides a new way to generate robot control policies. However, training a control policy requires lengthy exploration, resulting in low sample efficiency of reinforcement learning (RL) in real-world tasks. Both imitation learning (IL) and learning from demonstrations (LfD) improve the training process by using expert demonstrations, but imperfect expert demonstrations can mislead policy improvement. Offline-to-online reinforcement learning requires a lot of offline data to initialize the policy, and distribution shift can easily lead to performance degradation during online fine-tuning. To solve the above problems, we propose a learning from demonstrations method named A-SILfD, which treats expert demonstrations as the agent's successful experiences and uses these experiences to constrain policy improvement. Furthermore, we use an ensemble of Q-functions to prevent performance degradation caused by large estimation errors in the Q-function. Our experiments show that A-SILfD can significantly improve sample efficiency using a small number of expert demonstrations of varying quality. In four MuJoCo continuous control tasks, A-SILfD significantly outperforms baseline methods after 150,000 steps of online training and is not misled by imperfect expert demonstrations during training.
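As a rough illustration of how a Q-ensemble can limit the damage from estimation errors, the sketch below computes a temporal-difference target from the minimum over several Q estimates. The abstract does not say how A-SILfD aggregates its ensemble, so taking the minimum is an assumption here, and `ensemble_target` and its parameters are hypothetical names.

```python
from typing import Callable, Sequence

# A Q-function maps (state, action) to a scalar value estimate.
QFunction = Callable[[Sequence[float], Sequence[float]], float]

def ensemble_target(
    q_functions: Sequence[QFunction],
    reward: float,
    next_state: Sequence[float],
    next_action: Sequence[float],
    done: bool,
    gamma: float = 0.99,
) -> float:
    """TD target that bootstraps from the most pessimistic ensemble member."""
    if done:
        # No bootstrapping past a terminal transition.
        return reward
    # Taking the minimum damps overestimation by any single Q-function.
    pessimistic_q = min(q(next_state, next_action) for q in q_functions)
    return reward + gamma * pessimistic_q
```

With three toy estimators that return 1.0, 2.0 and 5.0, a reward of 1.0 and `gamma=0.5`, the target bootstraps from 1.0 rather than from the outlier 5.0.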

demonstration, machine learning, reinforcement learning, (16 more...)

arXiv:2212.03562

Country:

North America > United States > Massachusetts > Middlesex County > Reading (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting > Online (0.71)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
